The recently released Sphinx 0.9.9-rc2 adds support for the MySQL wire protocol and for SphinxQL, an SQL-like language for querying Sphinx indexes. This support is still in an early preview stage, but it is already fun to play with.

One thing worth mentioning: unlike MySQL storage engines, some of which (such as InfoBright or KickFire) take over execution after MySQL has parsed the query, Sphinx's MySQL support has nothing to do with MySQL itself. It is a from-scratch implementation of the wire protocol.
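To give a feel for what "implementing the wire protocol" means, here is a minimal sketch of how a client frames a text query in the MySQL client/server protocol: every packet starts with a 3-byte little-endian payload length and a 1-byte sequence id, and a text query is the COM_QUERY command byte (0x03) followed by the SQL text. This only shows framing; a real client must first complete the connection handshake and authentication, which is not covered here, and the helper names are my own.

```python
import struct

COM_QUERY = 0x03        # command byte for a text query
MAX_PAYLOAD = 0xFFFFFF  # largest length a 3-byte header can express


def frame_packet(payload: bytes, sequence_id: int) -> bytes:
    """Prefix payload with the 4-byte packet header:
    3-byte little-endian payload length, then a 1-byte sequence id."""
    if len(payload) >= MAX_PAYLOAD:
        # Payloads this large must be split across several packets; not handled here.
        raise ValueError("payload too large for a single packet in this sketch")
    header = struct.pack("<I", len(payload))[:3] + bytes([sequence_id & 0xFF])
    return header + payload


def com_query(sql: str) -> bytes:
    """Packet a client sends to run a text query; a new command starts at sequence id 0."""
    return frame_packet(bytes([COM_QUERY]) + sql.encode("utf-8"), sequence_id=0)


def parse_header(packet: bytes) -> tuple[int, int]:
    """Return (payload_length, sequence_id) from the first 4 bytes of a packet."""
    return int.from_bytes(packet[:3], "little"), packet[3]
```

Because the framing is this simple and identical for every server that speaks the protocol, any MySQL client library can send SphinxQL text to Sphinx without knowing it is not talking to mysqld.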

For this test I was not interested in full-text search performance; we already know Sphinx is much faster than MySQL's built-in full-text search. I was more interested in the performance of other queries that do not use full-text search at all.

For the tests I used the table from the forum search engine, keeping just a bunch of id columns and removing everything else:

This table contained some 25 million rows and had no indexes defined. Sphinx does not support explicit indexes, and it is clear that when MySQL can use an index for sorting it will be a lot faster.

First, sorting. Sphinx is smart about sorting: rather than sorting every row, it keeps only as many rows as it needs to satisfy the LIMIT.

Sphinx

MySQL

As you can see, Sphinx adds a couple of extra columns to the result set even if you have not asked for them.
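The LIMIT-bounded sorting described above is the classic top-N technique. The sketch below shows the general idea with a bounded min-heap; it is an illustration of the technique, not Sphinx's actual code. Instead of sorting all N rows (O(N log N) time, O(N) memory), it keeps at most `limit` candidates, so memory is O(limit) and time is O(N log limit).

```python
import heapq
from typing import Iterable


def top_n_desc(values: Iterable[int], limit: int) -> list[int]:
    """ORDER BY value DESC LIMIT `limit` without sorting the whole input.
    A min-heap holds the best `limit` candidates seen so far; its root is
    the weakest of them, so each new value needs only one comparison."""
    if limit <= 0:
        return []
    heap: list[int] = []
    for v in values:
        if len(heap) < limit:
            heapq.heappush(heap, v)
        elif v > heap[0]:               # beats the weakest kept candidate
            heapq.heapreplace(heap, v)  # pop the smallest, push v
    return sorted(heap, reverse=True)   # only `limit` items left to sort
```

Python's standard library offers the same thing as `heapq.nlargest(limit, values)`; the explicit loop is spelled out here only to show why the cost depends on the LIMIT rather than on the table size.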

Another thing to try is GROUP BY. Sphinx executes GROUP BY in fixed memory, which means results may be approximate; this is geared towards full-text search applications where exact numbers are not important.

Sphinx

MySQL
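Why can fixed-memory grouping be approximate? The toy sketch below caps the number of groups held in memory and, when the table is full, evicts the group with the smallest count so far. This eviction policy is my own assumption chosen to illustrate the effect; Sphinx's actual strategy may differ. The point is only that once there are more distinct groups than slots, some groups and their counts are lost.

```python
from typing import Hashable, Iterable


def bounded_group_count(keys: Iterable[Hashable], max_groups: int) -> dict:
    """COUNT(*) ... GROUP BY key using memory for at most `max_groups` groups.
    When a new key arrives and the table is full, the group with the smallest
    count so far is evicted, so small groups (and their counts) can be lost."""
    if max_groups <= 0:
        raise ValueError("max_groups must be positive")
    groups: dict = {}
    for k in keys:
        if k in groups:
            groups[k] += 1
        elif len(groups) < max_groups:
            groups[k] = 1
        else:
            victim = min(groups, key=groups.__getitem__)  # smallest count so far
            del groups[victim]
            groups[k] = 1
    return groups
```

With enough slots, `bounded_group_count("aabbbcd", 10)` returns the exact counts. With only two slots, `bounded_group_count("aabbbcd", 2)` returns `{"b": 3, "d": 1}`: group "a" (true count 2) was evicted while "d" (count 1) survived, which is exactly the kind of inaccuracy a search UI can tolerate but a report cannot.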

Another optimization I wanted to check is the “early block reject”, which should allow Sphinx to quickly throw away large blocks of attributes that do not contain any matching data:

Sphinx

MySQL

I expected a much larger lead in this case because of this optimization, but it seems to be broken in the tested version.

Also note the difference in result sets: Sphinx finds no rows and creates no groups, while MySQL reports a NULL group as the result.
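The idea behind early block reject can be sketched as follows, assuming each block of attribute rows records the minimum and maximum value it contains. The block size and layout here are made up for illustration and are not Sphinx's. A block whose [min, max] range cannot overlap the filter is skipped without looking at its rows, so a filter that matches nothing should touch almost no data.

```python
from dataclasses import dataclass


@dataclass
class Block:
    lo: int            # minimum attribute value stored in the block
    hi: int            # maximum attribute value stored in the block
    values: list[int]  # the attribute values themselves


def build_blocks(values: list[int], block_size: int) -> list[Block]:
    """Split a column into fixed-size blocks and record each block's min/max."""
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        blocks.append(Block(min(chunk), max(chunk), chunk))
    return blocks


def count_in_range(blocks: list[Block], lo: int, hi: int) -> tuple[int, int]:
    """Count values with lo <= v <= hi; return (matches, blocks_scanned)."""
    matches = scanned = 0
    for b in blocks:
        if b.hi < lo or b.lo > hi:
            continue  # early reject: no value in this block can match
        scanned += 1
        matches += sum(1 for v in b.values if lo <= v <= hi)
    return matches, scanned
```

For a column `0..19` split into blocks of 4, a filter on `5..9` scans only 2 of the 5 blocks, and a filter on `100..200` scans none. How much this helps in practice depends on how well values cluster within blocks; randomly ordered values give wide [min, max] ranges that rarely allow a block to be rejected.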

At this point SphinxQL is rather picky: it requires AS for every expression, and it failed to parse some queries for no apparent reason, though I expect these rough edges to be polished in the near future. The good news is that query execution maps to the same execution engine, which is quite stable, so SphinxQL itself will likely stabilize soon.

Sphinx also offers a number of SQL extensions that are helpful for search use cases. For example, WITHIN GROUP ORDER BY lets you select which row represents each group (say, the most recent document or the most relevant one).
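The effect of WITHIN GROUP ORDER BY can be approximated in plain code: group rows by one key and keep, for each group, the row that ranks highest by a second key. The sketch below is only an illustration of these semantics, not Sphinx's implementation; tie-breaking in Sphinx may differ, and the column names used in the example (`thread_id`, `post_id`, `posted_at`) are hypothetical.

```python
from typing import Any, Callable, Hashable, Iterable

Row = dict[str, Any]


def group_best(rows: Iterable[Row],
               group_key: Callable[[Row], Hashable],
               within_key: Callable[[Row], Any]) -> dict[Hashable, Row]:
    """Keep one row per group: the one with the largest within_key.
    Roughly GROUP BY group_key WITHIN GROUP ORDER BY within_key DESC;
    on ties, the row seen first is kept."""
    best: dict[Hashable, Row] = {}
    for row in rows:
        g = group_key(row)
        if g not in best or within_key(row) > within_key(best[g]):
            best[g] = row
    return best
```

For example, grouping forum posts by `thread_id` with `within_key=lambda r: r["posted_at"]` yields the most recent post of each thread, which is the "show most recent document per group" case mentioned above. Swapping in a relevance score instead of a timestamp gives the "most relevant" variant.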

You might find the native API more feature-complete at this point, but a command-line query language is very helpful for testing and debugging, and it also lets you access Sphinx from languages that do not have a native Sphinx API implementation: everyone seems to be able to talk to MySQL these days.

Now on performance: for this class of queries Sphinx was only 1.5-2 times faster. I honestly hoped for more, though I deliberately picked queries that are reasonably good for both of them. It is easy to “break” MySQL, for example by making it do a GROUP BY through an on-disk temporary table, in which case Sphinx comes out much faster; there are a few other such cases as well.

The real gain from Sphinx, however, comes from its ability to scale almost linearly across multiple CPU cores and multiple nodes. The raw scan speed was almost 10 million rows per second (on the rather outdated CPU I used for testing), which means you should be able to scan through 100M+ rows per second on a single modern 8-core server, which is quite a number.
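The arithmetic behind the 100M+ figure is simple enough to write down. The sketch assumes the measured ~10M rows/s was achieved on a single core and that scaling across cores is perfectly linear; both are assumptions, and real scaling will be somewhat below linear.

```python
def linear_scan_rate(rows_per_sec_per_core: float, cores: int,
                     per_core_speedup: float = 1.0) -> float:
    """Aggregate scan rate, assuming perfectly linear scaling across cores."""
    return rows_per_sec_per_core * cores * per_core_speedup


def speedup_needed(target_rows_per_sec: float, rows_per_sec_per_core: float,
                   cores: int) -> float:
    """Per-core speedup over the test CPU required to hit the target rate."""
    return target_rows_per_sec / (rows_per_sec_per_core * cores)


MEASURED = 10_000_000    # ~10M rows/s on the old test CPU (assumed: one core)
TABLE_ROWS = 25_000_000  # size of the test table

print(TABLE_ROWS / MEASURED)                     # 2.5 (seconds per full scan)
print(linear_scan_rate(MEASURED, 8))             # 80000000.0
print(speedup_needed(100_000_000, MEASURED, 8))  # 1.25
```

So eight cores at the old CPU's speed give about 80M rows/s, and a modern core only needs to be about 1.25 times faster than the test CPU per core to pass 100M rows/s.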

4 Comments
Mark Callaghan

Sphinx doesn’t get enough attention from us and we probably missed this post with all of the excitement during the UC.

volto

How can you access the SphinxQL wire protocol from PHP?

volto

Nevermind, figured it out, you can just use mysql_connect() pointed to the port sphinx is listening on. Didn’t know sphinx was configured to accept mysql syntax by default.